Search for: All records

Creators/Authors contains: "Mustafaraj, Eni"


  1. In the past decade, a number of sophisticated AI-powered systems and tools have been developed and released to the scientific community and the public. These technical developments have occurred against a backdrop of political and social upheaval that is both magnifying and magnified by public health and macroeconomic crises. These technical and socio-political changes offer multiple lenses to contextualize (or distort) scientific reflexivity. Further, to computational social scientists who study computer-mediated human behavior, they have implications for what we study and how we study it. How should the ICWSM community engage with this changing world? Which disruptions should we embrace, and which ones should we resist? Whom do we ally with, and for what purpose? In this workshop co-located with ICWSM, we invited experience-based perspectives on these questions with the intent of drafting a collective research agenda for the computational social science community. We did so via the facilitation of collaborative position papers and the discussion of imminent challenges we face in the context of, for example, proprietary large language models, an increasingly unwieldy peer review process, and growing issues in data collection and access. This document presents a summary of the contributions and discussions in the workshop.
    Free, publicly-accessible full text available July 1, 2024
  2. How do Google Search results change following an impactful real-world event, such as the U.S. Supreme Court decision on June 24, 2022 to overturn Roe v. Wade? And what do they tell us about the nature of event-driven content, generated by various participants in the online information environment? In this paper, we present a dataset of more than 1.74 million Google Search results pages collected between June 24 and July 17, 2022, intended to capture what Google Search surfaced in response to queries about this event of national importance. These search pages were collected for 65 locations in 13 U.S. states, a mix of red, blue, and purple states, with respect to their voting patterns. We describe the process of building a set of circa 1,700 phrases used for searching Google, how we gathered the search results for each location, and how these results were parsed to extract information about the most frequently encountered web domains. We believe that this dataset, which comprises raw data (search results as HTML files) and processed data (extracted links organized as CSV files) can be used to answer research questions that are of interest to computational social scientists as well as communication and media studies scholars. 
    Free, publicly-accessible full text available June 5, 2024
  3. This study assesses the awareness and perceived utility of two features Google Search introduced in February 2021: “About this result” and “More about this page”. Google stated that the goal of these features is to help users vet unfamiliar web domains (or sources). We investigated whether the features were sufficiently prominent to be detected by frequent users of Google Search, and their perceived utility for making credibility judgments of sources, in one-on-one user studies with 25 undergraduate college students who identify as frequent users of Google Search. Our results indicate a lack of adoption or awareness of these features by our participants and neutral-positive perceptions of their utility in evaluating web sources. We also examined the perceived usefulness of nine other domain credibility signals collected from the W3C.
  4. De Cristofaro, Emiliano; Nakov, Preslav (Eds.)
    Google’s reviewed claims feature was an early attempt to incorporate additional credibility signals from fact-checking onto the search results page. The feature, which appeared when users searched for the name of a subset of news publishers, was criticized by dozens of publishers for its errors and alleged anticonservative bias. By conducting an audit of news publisher search results and focusing on the critiques of publishers, we find that there is a lack of consensus among fact-checking ecosystem stakeholders that may be important to address in future iterations of public-facing fact-checking tools. In particular, we find that a lack of transparency coupled with a lack of consensus on what makes a fact-check relevant to a news article led to the breakdown of reviewed claims.
  5. Choosing the political party nominees, who will appear on the ballot for the US presidency, is a long process that starts two years before the general election. The news media plays a particular role in this process by continuously covering the state of the race. How can this news coverage be characterized? Given that there are thousands of news organizations, but each of us is exposed to only a few of them, we might be missing most of it. Online news aggregators, which aggregate news stories from a multitude of news sources and perspectives, could provide an important lens for the analysis. One such aggregator is Google’s Top stories, a recent addition to Google’s search result page. For the duration of 2019, we have collected the news headlines that Google Top stories has displayed for 30 candidates of both US political parties. Our dataset contains 79,903 news story URLs published by 2,168 unique news sources. Our analysis indicates that despite this large number of news sources, there is a very skewed distribution of where the Top stories are originating, with a very small number of sources contributing the majority of stories. We are sharing our dataset so that other researchers can answer questions related to algorithmic curation of news as well as media agenda setting in the context of political elections.
  6. When one searches for political candidates on Google, a panel composed of recent news stories, known as Top stories, is commonly shown at the top of the search results page. These stories are selected by an algorithm that chooses from hundreds of thousands of articles published by thousands of news publishers. In our previous work, we identified 56 news sources that contributed 2/3 of all Top stories for 30 political candidates running in the primaries of the 2020 US presidential election. In this paper, we survey US voters to elicit their familiarity with and trust in these 56 news outlets. We find that some of the most frequent outlets are not familiar to all voters (e.g., The Hill or Politico), or particularly trusted by voters of any political stripe (e.g., Washington Examiner or The Daily Beast). Why, then, are such sources shown so frequently in Top stories? We theorize that Google is sampling news articles from sources with different political leanings to offer balanced coverage. This is reminiscent of the so-called “fairness doctrine” (1949-1987) policy in the United States that required broadcasters (radio or TV stations) to air contrasting views about controversial matters. Because there are fewer right-leaning publications than center or left-leaning ones, in order to maintain this “fair” balance, hyper-partisan far-right news sources of low trust receive more visibility than some news sources that are more familiar to and trusted by the public.
  7. Search engines, by ranking a few links ahead of millions of others based on opaque rules, open themselves up to criticism of bias. Previous research has focused on measuring political bias of search engine algorithms to detect possible search engine manipulation effects on voters or unbalanced ideological representation in search results. Insofar as these concerns are related to the principle of fairness, this notion of fairness can be seen as explicitly oriented toward election candidates or political processes and only implicitly oriented toward the public at large. Thus, we ask the following research question: how should an auditing framework that is explicitly centered on the principle of ensuring and maximizing fairness for the public (i.e., voters) operate? To answer this question, we qualitatively explore four datasets about elections and politics in the United States: 1) a survey of eligible U.S. voters about their information needs ahead of the 2018 U.S. elections, 2) a dataset of biased political phrases used in a large-scale Google audit ahead of the 2018 U.S. election, 3) Google’s “related searches” phrases for two groups of political candidates in the 2018 U.S. election (one group is composed entirely of women), and 4) autocomplete suggestions and result pages for a set of searches on the day of a statewide election in the U.S. state of Virginia in 2019. We find that voters have much broader information needs than the search engine audit literature has accounted for in the past, and that relying on political science theories of voter modeling provides a good starting point for informing the design of voter-centered audits.
  8. In an increasingly information-dense web, how do we ensure that we do not fall for unreliable information? To design better web literacy practices for assessing online information, we need to understand how people perceive the credibility of unfamiliar websites under time constraints. Would they be able to rate real news websites as more credible and fake news websites as less credible? We investigated this research question through an experimental study with 42 participants (mean age = 28.3) who were asked to rate the credibility of various “real news” (n = 14) and “fake news” (n = 14) websites under different time conditions (6s, 12s, 20s), and with a different advertising treatment (with or without ads). Participants did not visit the websites to make their credibility assessments; instead, they interacted with the images of website screen captures, which were modified to remove any mention of website names, to avoid the effect of name recognition. Participants rated the credibility of each website on a scale from 1 to 7 and in follow-up interviews provided justifications for their credibility scores. Through hypothesis testing, we find that participants, despite limited time exposure to each website (between 6 and 20 seconds), are quite good at the task of distinguishing between real and fake news websites, with real news websites being overall rated as more credible than fake news websites. Our results agree with the well-known theory of “first impressions” from psychology, which has established the human ability to infer character traits from faces. That is, participants can quickly infer meaningful visual and content cues from a website that help them make the right credibility evaluation decision.
  9. Algorithmic auditing has emerged as an important methodology that gleans insights from opaque platform algorithms. These audits often rely on the repeated observations of an algorithm’s outputs given a fixed set of inputs. For example, to audit Google search, one repeatedly inputs queries and captures the resulting search pages. Then, the goal is to uncover patterns in the data that reveal the “secrets” of algorithmic decision making. In this paper, we introduce one particular algorithm audit, that of Google’s Top stories. We describe the process of data collection, exploration, and analysis for this application and share some of the insights. Concretely, our analysis suggests that Google may be trying to burst the “filter bubble” by choosing lesser-known publishers for the 3rd position in the Top stories. In addition to revealing the behavior of the platform, the audit also illustrated that a subset of publishers cover certain stories more than others.